
W3C Activity: Real Time Multimedia

This activity statement is part of the W3C activity list.

Introduction

Since the Web was invented, both the target audience and the content of a typical Web site have changed dramatically. Industry analysts predict another major change in the near future. They expect that Web content will become similar to the content of today's multimedia CD-ROMs or even today's television programs. In other words, they believe that the Web will be turned into a distribution system for both interactive and continuous multimedia content, or real-time multimedia content.

Several different communities are currently working independently on the integration of real-time multimedia into the Internet, namely the Web community, the CD-ROM community and the community working on Internet-based audio/video-on-demand. Given these diverse communities, the lack of a common forum for discussion and standardization creates the danger that a plethora of non-interoperable solutions will emerge.

These different solutions would most likely not result from healthy competition advancing technological progress. Instead, they would result from a simple lack of communication between three very different communities. Combining their orthogonal expertise holds the promise that a single, sound technical solution can be found for many of the issues of real-time multimedia content on the Web. Such agreement is the signal independent content providers need before they start creating real-time multimedia content for the Web, and thus a precondition for market growth in this area.

W3C has members from all three communities. Representatives from each community participated at the recent W3C workshop on "Real Time Multimedia and the Web". In the feedback we received after this event, members of all communities reported that they see W3C as a promising forum for exchanging ideas and for finding consensus on common solutions for integrating real-time multimedia into the Web.


Support for Creating Real Time Multimedia Content

Requirements

With regard to support for creating real-time multimedia content, both the Web community and the community working on Internet-Based Audio/Video-On-Demand can profit from the experiences gained by the CD-ROM community.

An overlap already exists between the functionality found on the Web and the functionality found in CD-ROM technology. Therefore, W3C work starts from the assumption that real-time multimedia will be integrated into the Web by using extensions and additions to the basic Web technologies, such as HTML, JPEG, GIF and PNG images, image maps and URLs.

This assumption makes W3C different from other groups working on real-time multimedia. For instance, the Java community uses Java byte-code as a distribution format for real-time multimedia content, and Java code generators for producing this content. DAVIC (Digital Audio-Visual Council) - a standards body for interactive TV - uses MHEG-5 as its base format, and uses only a subset of HTML for implementing hypertext functionality.


Products

Declarative Format for Time Representation and Media Synchronization

Web technology is limited today when it comes to creating continuous multimedia presentations. For these applications, content authors need to express things like "five minutes into the presentation, show image X and keep it on the screen for ten seconds". More generally speaking, there must be a way to describe the synchronization between the different media that make up a continuous multimedia presentation.

On the Web, media synchronization is currently expressed using a scripting language such as JavaScript or VisualBasic. However, scripting has well-known disadvantages: script-based content is often hard to produce and maintain, and it is hard to build search engines, conversion tools and other automated tools for scripting languages.

To address these disadvantages, CD-ROM technology uses declarative formats such as Apple's QuickTime as an alternative to scripting languages such as Macromedia's Lingo or Apple's HyperCard. In a declarative language, the "events" in a multimedia presentation are simply expressed using a time line, instead of writing a script program. Both approaches have succeeded in the CD-ROM marketplace, and are often used side by side in a particular multimedia product.
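The time-line idea can be sketched as plain data. The following is purely illustrative: the data layout and function names are hypothetical, and the format shown is not QuickTime, MHEG or HyTime.

```python
# Hypothetical sketch of a declarative timeline: each entry says when a
# medium appears and how long it stays visible, instead of scripting the
# events. Field names and layout are illustrative only.

def events_at(timeline, t):
    """Return the media items visible at time t (in seconds)."""
    return [item["media"] for item in timeline
            if item["start"] <= t < item["start"] + item["duration"]]

timeline = [
    {"media": "narration.au", "start": 0,   "duration": 600},
    # "five minutes into the presentation, show image X for ten seconds"
    {"media": "imageX.gif",   "start": 300, "duration": 10},
]

print(events_at(timeline, 305))   # narration and image both visible
print(events_at(timeline, 320))   # the image has disappeared again
```

A renderer driven by such a description needs no script interpreter, and automated tools (search engines, converters) can analyze the timeline directly.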

Coming up with a solution requires an analysis of standards such as MHEG or HyTime. However, both technologies have been criticized for being complex, and up to now have failed to gain widespread market acceptance. The QuickTime file format appears to be a good starting point, since it is freely available and implemented on a wide range of platforms. Constraint-based approaches are another interesting area that should be further evaluated.

Controlling Audio/Video Replay

Simple audio/video streams integrated into a Web page can be presented in different ways. For example, audio can be used as background, e.g. in the form of a continuous "loop", or the user can be given control over the replay via a user interface containing buttons labeled "play", "rewind", and so forth.

At the moment, there is no standardized way to control audio/video replay when writing a Web page. One possible approach is to define standard attributes for the "OBJECT" tag. More generally, a standard API could be provided that lets a Web content author control audio/video replay (see the W3C Activity on HTML).
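To illustrate what such a replay-control API might expose, here is a minimal player state machine. All method and attribute names are hypothetical; no standard API existed at the time of writing.

```python
# Minimal sketch of a replay controller offering the usual CD-player-style
# commands. Names are hypothetical; no standard API is implied.

class ReplayController:
    def __init__(self, duration):
        self.duration = duration   # total length in seconds
        self.position = 0.0        # current playback position
        self.state = "stopped"

    def play(self):
        self.state = "playing"

    def pause(self):
        if self.state == "playing":
            self.state = "paused"

    def stop(self):
        self.state = "stopped"
        self.position = 0.0

    def seek(self, t):
        # clamp to the valid range, as "rewind"/"fast-forward" would
        self.position = max(0.0, min(t, self.duration))

player = ReplayController(duration=180)
player.play()
player.seek(200)          # past the end: clamped to the duration
print(player.position, player.state)
```

Whether such operations are exposed as OBJECT attributes or as a scripting API, content authors would call the same small set of commands.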

Embedding URLs into Audio/Video Streams

Just like hypertext, audio and video presentations can be enhanced by hyperlinks, i.e. by giving the user the opportunity to download additional information about the audio/video output at a particular point in time. An example seen in some academic and commercial products is a video that displays a rectangle surrounding a certain object. Clicking inside the rectangle follows the link associated with that object. This is similar to the way image maps work for still images.
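The mechanism can be sketched as a hit test over time-indexed rectangles. The data layout below is hypothetical, intended only to show how video hotspots generalize image maps by adding a time dimension.

```python
# Sketch of time-indexed video hotspots: each hyperlink is a rectangle
# that is active during a time interval. Layout is illustrative only.

hotspots = [
    # (start, end, x, y, width, height, url)
    (10.0, 25.0, 100, 50, 200, 120, "http://example.org/object-info"),
]

def link_at(hotspots, t, x, y):
    """Return the URL hit by a click at (x, y) at playback time t, or None."""
    for start, end, rx, ry, w, h, url in hotspots:
        if start <= t <= end and rx <= x < rx + w and ry <= y < ry + h:
            return url
    return None

print(link_at(hotspots, 12.0, 150, 100))  # inside the active rectangle
print(link_at(hotspots, 30.0, 150, 100))  # too late: rectangle is gone
```

A still-image map is the degenerate case where every rectangle is active for the whole presentation.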

Addressing Subparts of Audio/Video Files via URLs

Random access to large audio and video files is often useful. It would, for example, allow the implementation of "edit-lists" on the Web, a feature known from many digital audio/video editing systems. Using such URL-based edit-lists, new audio and video content could be created by referencing subsections of existing audio/video files stored on the Web. These subsections (or "clips") must be addressable via URLs that describe a time range, e.g. "the clip in audio file F starting at minute 5 and ending at minute 10".
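As a sketch, assume a hypothetical `?start=...&end=...` query syntax for time ranges (no standard syntax for addressing time ranges in URLs existed). A clip URL can then be parsed, and an edit-list is simply a sequence of such URLs:

```python
# Sketch of URL-based clip addressing. The "start"/"end" query syntax
# is hypothetical, used here only to illustrate the idea.

from urllib.parse import urlparse, parse_qs

def parse_clip(url):
    """Return (file path, start seconds, end seconds) for a clip URL."""
    parts = urlparse(url)
    q = parse_qs(parts.query)
    return parts.path, float(q["start"][0]), float(q["end"][0])

# "the clip in audio file F starting at minute 5 and ending at minute 10"
clip = parse_clip("http://example.org/F.au?start=300&end=600")
print(clip)

# An edit-list is a sequence of clip URLs composed into a new piece:
edit_list = [
    "http://example.org/F.au?start=300&end=600",
    "http://example.org/G.au?start=0&end=120",
]
total = sum(end - start for _, start, end in map(parse_clip, edit_list))
print(total)   # total playing time of the composed piece
```

The server would resolve such a URL by delivering only the requested time range, so the composed piece never has to be stored as a new file.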

A Common Fallback-Format for Audio/Video Data

Many companies competing in the market for audio/video tools are currently differentiating their products by using a proprietary codec (coder/decoder). Existing standard audio and video formats were not originally designed for the Internet, and thus do not cope well with its specific problems (e.g. packet loss, variable bandwidth). Allowing competition to find a good solution in this area appears beneficial, and a standard could be premature.

However, it would still be useful to establish one or several common fallback-codecs that could be used to achieve interoperability between tools from competing vendors. Such "Internet-suitable" audio/video codecs could be developed following a similar model as for the PNG image format. In the case of PNG, an independent group of developers chose W3C as the organization for distributing and maintaining their "patent-free" standard.

Recommendations for other Data Formats used on CD-ROMs

These include, for example, formats for synthesized sound such as MIDI and formats for sprite-based animation.


Support for Network Transmission of Real Time Multimedia

Requirements

With regard to transmitting real-time multimedia content over the Internet, both the Web community and the CD-ROM community can profit from the experiences gained by the community working on Internet-based Audio/Video-On-Demand.

Today, Web content is primarily transported over the Internet. Therefore, W3C work on support for transmitting real-time multimedia focuses on this network. Other groups working on real-time multimedia follow different assumptions. For instance, DAVIC recommendations are primarily targeted at distribution networks which are based on ATM and MPEG system streams, and use cable and satellite systems for delivering interactive TV content.


Products

Real-Time Streaming Protocol (RTSP)

This protocol is currently being discussed by the IETF. RTSP provides methods that realize commands similar to those of a CD player, such as play, fast-forward, pause, stop and record. The RTSP version (RTSP', or "RTSP prime") currently most likely to be accepted by the IETF shares many properties with HTTP. The motivation is to reuse technology that has been developed for HTTP (caching of content, authentication, encryption, PICS, JEPI) when accessing real-time multimedia content. W3C is thus carefully tracking the development of RTSP.
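The similarity to HTTP can be sketched by formatting a request. RTSP was still an IETF draft at the time of writing, so the exact header set below is illustrative rather than a definitive rendering of the protocol:

```python
# Sketch of an HTTP-like RTSP request line plus headers. RTSP was an
# IETF draft at the time; the headers shown are illustrative.

def rtsp_request(method, url, seq, headers=None):
    """Build a text request in the familiar HTTP request/header shape."""
    lines = ["%s %s RTSP/1.0" % (method, url), "CSeq: %d" % seq]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    return "\r\n".join(lines) + "\r\n\r\n"

req = rtsp_request("PLAY", "rtsp://example.org/movie", seq=2,
                   headers={"Range": "npt=0-"})
print(req)
```

Because the wire format mirrors HTTP, proxies, caches and authentication machinery built for HTTP could plausibly be adapted with modest effort.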

Addressing "Streaming" Audio/Video Resources via URLs

Switching between HTTP and a real-time protocol currently requires one additional network round-trip time in order to retrieve a "session description file" or "metafile". Usually, this file is then passed to a helper application. The extra round-trip for retrieving the description file increases network delay, and thus the response time observed by the user. Moreover, configuring helper applications and MIME-types is cumbersome for the end-user. This calls for a way to address streaming resources more directly via a URL.
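The difference can be sketched as dispatch on the URL scheme: with a streaming scheme in the URL itself, the client can contact the media server directly instead of first fetching a metafile over HTTP. The handler table below is hypothetical.

```python
# Sketch: dispatching on the URL scheme lets a client open a stream
# directly, avoiding the extra HTTP round-trip for a metafile and the
# helper-application configuration. The handler table is illustrative.

from urllib.parse import urlparse

def choose_handler(url):
    scheme = urlparse(url).scheme
    handlers = {
        "http": "fetch document (or metafile) over HTTP",
        "rtsp": "open streaming session directly",
    }
    return handlers.get(scheme, "unknown scheme")

print(choose_handler("rtsp://example.org/talk"))
print(choose_handler("http://example.org/talk.html"))
```

The user-visible effect is one less round-trip before the first audio or video frame arrives.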

Application Level Framing for Web Data Formats

The IETF has developed RTP (Real Time Transport Protocol) as the standard for carrying data for real-time multimedia applications over the Internet. This protocol is also used to transport audio and video in the H.323 conferencing standard.

One of the design principles behind RTP is that real-time data should be split into packets in such a way that each packet can be processed independently by the receiver application (application-level framing).

This greatly facilitates synchronizing real-time multimedia streams when there is packet loss on the Internet. As an example, consider transmitting an HTML page that is synchronized with an audio stream. With application-level framing, packet loss in the HTML transmission will not lead to an interruption in the real-time output, but only to a "hole" in the displayed HTML page.
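The effect of application-level framing can be simulated. The packetization below is purely illustrative (it is not an actual RTP payload format): each fragment is self-contained, so a lost packet leaves a local hole rather than stalling the presentation.

```python
# Simulation of application-level framing: the page is split into
# self-contained, numbered fragments, so a lost packet leaves a visible
# "hole" instead of interrupting the real-time output. Illustrative only.

def packetize(fragments):
    """Number each self-contained fragment so the receiver can place it."""
    return list(enumerate(fragments))

def render(packets, total):
    """Reassemble what arrived; missing packets become visible holes."""
    received = dict(packets)
    return [received.get(i, "[missing fragment]") for i in range(total)]

fragments = ["<h1>Title</h1>", "<p>First paragraph</p>", "<p>Second</p>"]
packets = packetize(fragments)
lost = [p for p in packets if p[0] != 1]   # packet 1 is lost in transit
print(render(lost, len(fragments)))
```

Without such framing, the receiver would have to wait for a retransmission of packet 1 before displaying anything that follows it, breaking synchronization with the audio stream.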

Application level framing also facilitates multicasting of Web content. In general, multicasting can alleviate the problem of overloaded servers, and save Internet bandwidth. One way to use multicast on the Web is to send very popular content to an infrastructure of servers (see the W3C Activity on Replication and Caching). The broadcast model of television channels suggests a different approach, namely delivering popular content directly to an end-user's browser via multicast. Prototype applications for this purpose already exist, e.g. mMosaic or Shared Mosaic. However, these applications do not use application level framing. Thus, they are only capable of solving the problem of server overload. They do not necessarily save Internet bandwidth.

Packetization schemes for HTML that allow displaying HTML using application level framing techniques can be derived from similar techniques that have been developed for shared editing of SGML documents. A similar packetization scheme is required for GIF images. For JPEG images, it should be possible to reuse the existing RTP payload format for MJPEG ("moving JPEG").


Current Situation

Analysis

Potential W3C products in the area of real-time multimedia are:


Achievements


Next Step

"Call for Interest" for declarative format that allows describing the synchronization of different real-time multimedia streams. Interest in W3C taking on this work has been expressed by many participants of the W3C workshop on "Real- Time Multimedia and the Web".


Acknowledgments

Many of the ideas contained in this document are based on presentations and discussions at the W3C workshop on "Real-Time Multimedia and the Web". Special thanks go to the workshop participants who replied to our request and gave us extensive written feedback on future directions for W3C work:


W3C team at INRIA, Ed. Philipp Hoschka 960531 Webmaster
Created May 1996 , Last Change: January 22, 1997